Dense linear algebra libraries, such as BLAS and LAPACK, provide a collection of numerical tools relevant to many scientific and engineering applications. While high-performance implementations of the BLAS (and LAPACK) functionality exist for many current multi-threaded architectures, the adaptation of these libraries to asymmetric multicore processors (AMPs) is still pending. In this paper we address this challenge by developing an asymmetry-aware implementation of the BLAS, based on the BLIS framework and tailored for AMPs equipped with two types of cores: fast/power-hungry versus slow/energy-efficient. For this purpose, we integrate coarse-grain and fine-grain parallelization strategies into the library routines which, respectively, dynamically distribute the workload between the two core types and statically repartition this work among the cores of the same type. Experiments on an ARM big.LITTLE processor embedded in the Exynos 5422 SoC, using the asymmetry-aware version of the BLAS and a plain migration of the legacy version of LAPACK, assess the benefits, limitations, and potential of this approach.
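
To make the two-level strategy concrete, the following is a minimal sketch rather than the implementation developed in the paper. It simulates the coarse-grain step as a dynamic scheduler in which whichever cluster becomes idle first takes the next chunk of matrix rows, and the fine-grain step as a static, near-equal split of each chunk among the cores of that cluster. The names `Cluster`, `static_split`, and `schedule`, as well as the chunk sizes, core counts, and processing rates, are hypothetical values chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    cores: int          # number of cores of this type (illustrative)
    chunk: int          # rows taken per coarse-grain request (hypothetical knob)
    rate: float         # rows per time unit for the whole cluster (hypothetical)
    clock: float = 0.0  # simulated time at which the cluster becomes idle
    rows: int = 0       # total rows processed so far
    assignments: list = field(default_factory=list)


def static_split(start, stop, parts):
    """Fine-grain step: split [start, stop) into `parts` contiguous,
    near-equal ranges, one per core of the same type."""
    base, extra = divmod(stop - start, parts)
    ranges, lo = [], start
    for i in range(parts):
        hi = lo + base + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def schedule(m, clusters):
    """Coarse-grain step: the cluster that becomes idle first takes the
    next chunk of rows, so faster clusters end up with more work."""
    next_row = 0
    while next_row < m:
        c = min(clusters, key=lambda cl: cl.clock)
        stop = min(next_row + c.chunk, m)
        c.assignments.append(static_split(next_row, stop, c.cores))
        c.clock += (stop - next_row) / c.rate
        c.rows += stop - next_row
        next_row = stop
    return clusters


# Assumed configuration: two four-core clusters, the fast one three times
# quicker than the slow one. These numbers are not taken from the paper.
big = Cluster("big", cores=4, chunk=152, rate=3.0)
little = Cluster("LITTLE", cores=4, chunk=32, rate=1.0)
schedule(4096, [big, little])
print(big.rows, little.rows)
```

In this simulation the share of rows each cluster receives follows from its assumed rate rather than from a fixed ratio, which is the point of making the coarse-grain distribution dynamic; the split within a cluster can remain static because its cores are identical.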